The aim of this project is to find out if portfolios made of 15 randomly selected stocks from the S&P500 Large-Cap Index can outperform stock-picking experts and index investing. The portfolios were optimized based on the mean-variance framework developed by Harry Markowitz for simplicity, with the goal of maximizing the ex-post Sharpe Ratio during the optimization period.
The project is meant to be a fun/thought experiment, and is not meant to disprove of index or active investing. It does not make use of any statistical methods to prove that random selection, expertise or index investing is a better choice.
library(PortfolioAnalytics) # For portfolio optimization and analysis
library(RColorBrewer) # For color palettes in plots
library(ROI) # For ROI solver in portfolio optimization
library(ROI.plugin.glpk) # Part of the ROI solver for linear optimization
library(ROI.plugin.quadprog) #Part of the ROI solver for quadratic optimization
library(tidyquant) # For quantmod and PerformanceAnalytics functions
library(tidyverse) # For dplyr and ggplot2 functions (data manipulation and plotting)The components of S&P500 can be obtained using
tidyquant::tq_index().
# Only require ticker and maybe name of company, which is columns 1 and 2 of query result
sp500 <- tidyquant::tq_index(x = "SP500")[,1:2]
sp500tickers <- sp500$symbolUsing quantmod::getSymbols(), I
retrieved the daily adjusted closing prices of the tickers from Yahoo
Finance, starting from 1 January 2015 to 1 July 2022.
startdate <- as.Date("2015-01-01")
enddate <- as.Date("2022-07-01")
prices <- NULL
for (t in tickers) { # Use for loop to get prices of all tickers
tmp <- NULL
tmp <- try( # Use try function to skip loop and return error message when price cannot be retrieved
expr = {quantmod::getSymbols(Symbols = t, src = "yahoo", auto.assign = FALSE,
from = startdate, to = enddate, periodicity = "daily")[,6]},
silent = TRUE
)
if(inherits(tmp, "try-error")) {cat("ERROR:", t, sep = " "); next}
names(tmp) <- t
prices <- cbind(prices, tmp)
}## ERROR: BRK.BERROR: BF.B
# Check dimension of object, start and end date of data collected
dim(prices); start(prices); end(prices)## [1] 1887 502
## [1] "2015-01-02"
## [1] "2022-06-30"
The download has failed for BRK.B (Berkshire Hathaway)
and BF.B (Brown-Forman Corporation) because the tickers in
Yahoo Finance are BRK-B and BF-B respectively.
I used the correct tickers to retrieve their data and added them to
prices. I also checked which stocks had NA over the
specified period and removed them from prices.
prices <- cbind(quantmod::getSymbols(Symbols = "BRK-B", src = "yahoo", auto.assign = F,
from = startdate, to = enddate, periodicity = "daily")[,6],
quantmod::getSymbols(Symbols = "BF-B", src = "yahoo", auto.assign = F,
from = startdate, to = enddate, periodicity = "daily")[,6],
prices)
colnames(prices)[1:2] <- c("BRK-B", "BF-B")
# Check which stocks have NA over the specified period
data.frame(rbind(which(colSums(is.na(prices)) > 0)), row.names = "Number of NAs")# Keep only stocks that do not have NAs over the specified period
prices <- prices[, which(colSums(is.na(prices)) == 0)]
dim(prices) # Left with 482 stocks## [1] 1887 482
15 tickers were sampled from prices without replacement
each time, which will be used to create 10 portfolios at the end. The
set.seed() allows the random sample to be
reproducible.
samp_stocks <- list()
for (i in 1:10) {
set.seed((50+i)^2)
samp_stocks[[i]] <- prices[, sample(x = ncol(prices), size = 15, replace = FALSE)]
names(samp_stocks)[i] <- paste("S", i, "_prices", sep = "")
}
# Check stocks sampled into each portfolio
data.frame(sapply(samp_stocks, colnames), row.names = 1:15)Discrete returns are used as it can be added across assets, unlike
log-returns. Returns can be calculated using
PerformanceAnalytics::Return.calculate()
but since samp_stocks is a list object, the function needs
to be used together with lapply().
stock_returns <- lapply(samp_stocks, function(x) {
na.omit(PerformanceAnalytics::Return.calculate(x, method = "discrete"))
})
names(stock_returns) <- paste("S", c(seq(1:10)), "_returns", sep = "")
data.frame(sapply(stock_returns, dim), row.names = c("Number of rows", "Number of columns"))# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data
lapply(stock_returns[1:2], function(x) {
head(x[,1:5], n = 4)
})## $S1_returns
## WY V AMGN NOC SCHW
## 2015-01-05 0.000000000 -0.022073753 -0.01188338 -0.021097814 -0.03342164
## 2015-01-06 -0.001107650 -0.006443594 -0.03221695 0.005510299 -0.03663123
## 2015-01-07 0.003049679 0.013398154 0.03492451 0.031631465 0.01954536
## 2015-01-08 0.010779294 0.013412803 -0.00360209 0.023197606 0.02614153
##
## $S2_returns
## DG MCD MOS SBAC CINF
## 2015-01-05 -0.006498952 -0.011044559 -0.01791585 -0.015769986 -0.014338451
## 2015-01-06 -0.012656687 0.001843226 0.01334830 -0.005218771 -0.007666568
## 2015-01-07 0.012098692 0.017424265 0.00526882 0.013713724 0.013668776
## 2015-01-08 -0.010815288 0.003723105 0.01179311 0.009987207 0.021301508
Using
PortfolioAnalytics::portfolio.spec(),
portfolio specification for the mean-variance framework can be created.
Because each portfolio contains different assets, we need to create 10
specifications for optimization.
portspecs <- list()
for (s in 1:10) {
# Initialize portfolio specification
portspecs[[s]] <- PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
# Sum of weights constrained to 1, can also specify as type = "full investment"
portspecs[[s]] <- PortfolioAnalytics::add.constraint(portspecs[[s]],
type = "weight_sum",
min_sum= 1, max_sum = 1)
# Weight constraint on each stock, max is 15% of portfolio
portspecs[[s]] <- PortfolioAnalytics::add.constraint(portspecs[[s]],
type="box",
min=0, max=0.15)
# Objective to minimize risk based on variance (function will default to standard deviation as measure of risk)
portspecs[[s]] <- PortfolioAnalytics::add.objective(portspecs[[s]],
type = "risk",
name = "var")
# Adding a return objective, thus we maximize mean return per unit of risk
portspecs[[s]] <- PortfolioAnalytics::add.objective(portspecs[[s]],
type = "return",
name = "mean")
# Add name to specification to distinguish them
names(portspecs)[s] <- paste("P", s, "_spec", sep = "")
}
# Example of what portspecs contains
portspecs$P1_spec## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
##
## Number of assets: 15
## Asset Names
## [1] "WY" "V" "AMGN" "NOC" "SCHW" "CPRT" "DFS" "MCHP" "BF-B" "CINF"
## More than 10 assets, only printing the first 10
##
## Constraints
## Enabled constraint types
## - weight_sum
## - box
##
## Objectives:
## Enabled objective names
## - var
## - mean
The portfolios are optimized with
PortfolioAnalytics::optimize.portfolio.rebalancing(),
which re-optimizes the portfolios every specified period. The main
arguments are rebalance_on,
training_period and
rolling_window. We can decide the
frequency of rebalancing (“months”, “quarters”, and “years”), the number
of periods used for optimization (specify an integer matching frequency
of data, in this case daily) and the number of periods to roll the
window used for optimization. I also added the argument
maxSR = TRUE to maximize the Sharpe Ratio
of the portfolios (the demo can be found here).
The function
PortfolioAnalytics::optimize.portfolio()
only allows for single-period optimization.
optports <- list()
for (p in 1:10) {
optports[[p]] <- PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]],
portfolio = portspecs[[p]],
optimize_method = "ROI",
rebalance_on = "quarters",
training_period = 252,
rolling_window = 125,
maxSR = TRUE)
names(optports)[p] <- paste("P", p, "_optimal", sep = "")
}
# Example of what optports contains
optports$P1_optimal## **************************************************
## PortfolioAnalytics Optimization with Rebalancing
## **************************************************
##
## Call:
## PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]],
## portfolio = portspecs[[p]], optimize_method = "ROI", maxSR = TRUE,
## rebalance_on = "quarters", training_period = 252, rolling_window = 125)
##
## Number of rebalancing dates: 26
## First rebalance date:
## [1] "2016-03-31"
## Last rebalance date:
## [1] "2022-06-30"
##
## Annualized Portfolio Rebalancing Return:
## [1] 0.164621
##
## Annualized Portfolio Standard Deviation:
## [1] 0.206573
I used 252 periods for training/optimization and rolled the window forward every 125 days, so the portfolios are re-optimized based on previous 1 year (252 trading days) data every half-year (125 trading days) and re-balanced every quarter.
Since the portfolio was re-optimized every half-year, the optimal weights selected using the mean-variance framework with a target of maximizing ex-post Sharpe Ratio should change over time. The weights of the Portfolio 1 are plotted below.
chart.Weights(object = optports[[1]],
colorset = c(RColorBrewer::brewer.pal(n = 7, name = "Dark2"), RColorBrewer::brewer.pal(n = 8, name = "Accent")),
main = "Portfolio 1 Component Weights")To calculate daily portfolio returns, the daily weights of the
portfolio are required and can be extracted from the optimal portfolios
using
PortfolioAnalytics::extractWeights().
port_weights <- lapply(optports, FUN = PortfolioAnalytics::extractWeights)
names(port_weights) <- paste("P", c(seq(1:10)), "_weights", sep = "")
# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data each
lapply(port_weights[1:3], function(x) {
head(x[,1:5], n = 4)
})## $P1_weights
## WY V AMGN NOC SCHW
## 2016-03-31 2.498051e-02 1.356383e-16 3.739019e-17 0.15 -7.108513e-17
## 2016-06-30 -1.280954e-17 5.705632e-17 -2.210002e-17 0.15 1.664727e-16
## 2016-09-30 1.288686e-16 -2.789032e-17 7.170911e-04 0.15 5.654058e-17
## 2016-12-30 -1.669187e-17 1.544942e-17 -1.372749e-17 0.00 1.500000e-01
##
## $P2_weights
## DG MCD MOS SBAC CINF
## 2016-03-31 1.500000e-01 1.500000e-01 -9.226609e-18 -1.203994e-18 1.500000e-01
## 2016-06-30 1.500000e-01 2.711126e-16 -4.044221e-18 -1.985114e-18 1.500000e-01
## 2016-09-30 -2.852867e-18 -5.927237e-18 -2.150322e-17 9.359765e-17 1.500000e-01
## 2016-12-30 1.678395e-17 3.777998e-02 -2.580449e-18 -4.229156e-17 2.581569e-18
##
## $P3_weights
## GS PAYX REGN LRCX CHRW
## 2016-03-31 -4.559622e-17 1.500000e-01 -6.344878e-18 0.1500000 1.500000e-01
## 2016-06-30 9.257308e-17 1.500000e-01 -2.345520e-19 0.0999999 1.500000e-01
## 2016-09-30 1.573552e-16 1.500000e-01 -6.485471e-17 0.1096971 -1.372825e-16
## 2016-12-30 1.500000e-01 -1.080069e-16 8.916761e-18 0.1500000 -9.512732e-17
PerformanceAnalytics::Return.portfolio()
can be used to calculate the daily portfolio returns.
port_returns <- NULL
for (r in 1:10) {
port_returns <- cbind(port_returns,
PerformanceAnalytics::Return.portfolio(R = stock_returns[[r]],
weights = port_weights[[r]],
geometric = TRUE))
names(port_returns)[r] <- paste("P", r, "_returns", sep = "")
}
data.frame(head(port_returns))Since the aim of this project was to determine if portfolios of random stocks could outperform stock-picking experts and index investing, relevant proxies for benchmarking are required.
The proxies are returns calculated from the Daily Adjusted Closing Prices of popular index or actively-managed Exchange-Traded Funds that have sizeable Total Assets Under Management screened from etfdb.com. The actively-managed ETFs were filtered to have an inception date before 01 January 2015.
The index ETFs chosen are:
The actively-managed ETFs chosen are:
index_tickers <- c("SPY", "VTI", "QQQ", "IWM", "IJR", "IJH")
index_prices <- NULL
for (i in index_tickers) {
index_prices <- cbind(index_prices,
quantmod::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE,
from = startdate, to = enddate, periodicity = "daily")[,6])
}
colnames(index_prices) <- index_tickers
active_tickers <- c("ARKK", "ARKG", "EMLP", "ARKW", "ARKQ")
active_prices <- NULL
for (i in active_tickers) {
active_prices <- cbind(active_prices,
quantmod::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE,
from = startdate, to = enddate, periodicity = "daily")[,6])
}
colnames(active_prices) <- active_tickersThe discrete method was also applied to the calculation of benchmark returns for comparison of the cumulative return performance.
index_returns <- na.omit(PerformanceAnalytics::Return.calculate(prices = index_prices, method = "discrete"))
data.frame(head(index_returns, 4))active_returns <- na.omit(PerformanceAnalytics::Return.calculate(prices = active_prices, method = "discrete"))
data.frame(head(active_returns, 4))PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,1:5], index_returns),
geometric = TRUE,
main = "Cumulative Returns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,6:10], index_returns),
geometric = TRUE,
main = "Cumulative Returns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")QQQ ETF that tracks the NASDAQ-100
Index, which has a heavy focus on the technology sector, had higher
cumulative returns than the other Index ETFs. The only portfolio that
managed to outperform the QQQ ETF is Portfolio 2, which is
made up of
PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,1:5], index_returns),
geometric = TRUE,
main = "Drawdowns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,6:10], index_returns),
geometric = TRUE,
main = "Drawdowns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")In general, the IWM, IJH and
IJR ETFs have larger drawdowns, since they track the
small-cap and mid-cap companies in the U.S., which tends to be more
volatile. The portfolios created from 10 randomly selected stocks had
similar drawdowns to the ETFs.
table.AnnualizedReturns(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)table.CAPM(Ra = port_returns, Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)table.DownsideRisk(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)For the metrics that require risk-free rate (Rf), I chose an annual rate of 2.5%. For the Minimum Acceptable Return (MAR) in calculating Downside Deviation, I used an annual rate of 7%, which is approximately the long-run inflation-adjusted return of the S&P500. Portfolio 2 had the highest annualized return and Sharpe Ratio and the lowest maximum drawdown.
PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,1:5], active_returns),
geometric = TRUE,
main = "Cumulative Returns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"),
plot.engine = "plotly")PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,6:10], active_returns),
geometric = TRUE,
main = "Cumulative Returns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"),
plot.engine = "plotly")The ARK ETFs (ARKK, ARKG, ARKW, ARKQ) had high cumulative returns during the 2020 to 2021 period, but were the most volatile as well. Portfolio 2, which did the best against Index ETFs, had better cumulative returns in 2022 than the ARK ETFs.
PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,1:5], active_returns),
geometric = TRUE,
main = "Drawdowns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,6:10], active_returns),
geometric = TRUE,
main = "Drawdowns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")The ARK ETFs had the largest drawdowns since 2021. In comparison, the
EMLP ETF and portfolios made of random stocks had lower
drawdowns.
table.AnnualizedReturns(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)table.CAPM(Ra = cbind(port_returns, active_returns), Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)table.DownsideRisk(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)The benchmark returns (Rb) in
table.CAPM() used SPY returns to give a
comparison of the alpha and beta of the portfolios and actively-managed
ETFs. Again, Portfolio 2 had the highest annualized return and Sharpe
Ratio and the lowest maximum drawdown. The ARK ETFs would have
outperformed Portfolio 2 if an investor bought at the start of 2015 and
sold in early 2021 at its peak.
Some caveats about this experiment:
I would like to stress that this project was mainly carried out as a fun/thought experiment. It was not meant to disprove the expertise of active managers, the strategies employed by these funds for stock selection or the power of investing into an index ETF. From the results of this simple experiment, only 1 portfolio out of 10 (Portfolio 2) managed to achieve greater cumulative returns over the years than the rest of the pack. There would be an incredible amount of luck involved to select 15 stocks from the S&P500 universe of stocks that can provide such returns, if they were to be chosen at random.